Skip to content

geotiff: bundle remaining ambiguous-metadata fail-closed checks (#1987)#2030

Merged
brendancol merged 2 commits into
1987-pr1-conflicting-crs-writefrom
1987-pr2to7-remaining-fail-closed
May 18, 2026
Merged

geotiff: bundle remaining ambiguous-metadata fail-closed checks (#1987)#2030
brendancol merged 2 commits into
1987-pr1-conflicting-crs-writefrom
1987-pr2to7-remaining-fail-closed

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

Bundles four of the five remaining #1987 slices: UnparseableCRSError, RotatedTransformError, NonUniformCoordsError, and ConflictingNodataError. ConflictingCRSError already merged via #2021. MixedBandMetadataError is deferred to a follow-up so the ~35 existing VRT test fixtures can migrate to the new band_nodata='first' opt-out in their own commit.

Builds on

This branch was authored on top of 1987-pr1-conflicting-crs-write so the two stay clean together. Easiest if #2021 lands first.

Active checks added

UnparseableCRSError (#1987 PR 2)

  • Write: re-typed the existing _validate_crs_fallback raise from plain ValueError to UnparseableCRSError (subclass; no behaviour change).
  • Read: new _check_read_unparseable_crs that runs pyproj.CRS.from_user_input on crs_wkt. Tolerates EPSG:4326-style placeholders (the GDAL <SRS> convention).
  • Opt-out: allow_unparseable_crs=True kwarg, threaded through open_geotiff / read_geotiff_dask / read_geotiff_gpu / read_vrt.

RotatedTransformError (#1987 PR 3)

  • Read: rejects affine transforms with non-zero b / d terms. Downstream xrspatial ops assume axis-aligned grids and silently produce wrong results on a rotated grid.
  • Opt-out: allow_rotated=True, same threading.

NonUniformCoordsError (#1987 PR 4)

ConflictingNodataError (#1987 PR 7)

  • Write: rejects when attrs['nodata'] disagrees with every concrete entry in attrs['nodatavals']. None and NaN entries in the rioxarray tuple are skipped. NaN canonical + concrete tuple entry also raises (contradictory sentinels).
  • Opt-out: explicit nodata= writer kwarg overrides both attrs.

Shared infrastructure

  • _attrs.py gains _validate_read_geo_info(...), a one-call wrapper used from all four read backends so each backend has a single check call site.
  • Each writer entry point now builds a context dict carrying crs_kwarg / attrs_crs / attrs_crs_wkt / nodata_kwarg / attrs_nodata / attrs_nodatavals / coord_y / coord_x and feeds it to validate_write_metadata.

Deferred: MixedBandMetadataError (#1987 PR 5)

The check function and constants are in _validation.py but the registration is commented out. The band_nodata= kwarg is already threaded through read_vrt / _read_vrt_chunked so the follow-up is a one-line registration plus the test sweep over ~35 fixtures.

Test updates

File Change
test_attrs_contract_aliases_1984.py::test_canonical_nodata_wins_over_aliases Split into a resolver-layer test (canonical still wins at the read chokepoint) and a write-layer test (now asserts the fail-closed raise + the nodata= kwarg opt-out).
test_nodata_attr_aliases_1582.py::test_explicit_nodata_attr_wins_over_aliases Updated to expect ConflictingNodataError plus the kwarg-bypass path.
test_reader_kwarg_order_1935.py New kwargs added to the canonical order; band_nodata allowed in read_vrt's tail; gpu alias moved back to last on read_geotiff_gpu.
test_ambiguous_metadata_hooks_1987.py Framework dispatch tests now use an opaque payload key (_dispatch_probe) instead of the crs_wkt / MALFORMED placeholders the new check would refuse.
test_remaining_fail_closed_1987.py New file. 19 tests across the four active checks plus the read-then-write round-trip.

Test plan

  • pytest xrspatial/geotiff/tests/ -k "not gpu and not cuda" — 3013 passed (+19 new tests vs pre-PR baseline).
  • pytest xrspatial/geotiff/tests/test_remaining_fail_closed_1987.py -v — 19 passed.

Refs #1987.

Bundles four of the five remaining #1987 slices (PRs 2, 3, 4, 7 in the
issue's numbering). PR 6 (``ConflictingCRSError``) landed earlier as
this branch's parent. PR 5 (``MixedBandMetadataError``) is intentionally
deferred to a follow-up that also migrates ~35 existing VRT test
fixtures via the new ``band_nodata='first'`` opt-out kwarg.

Active checks added in this PR
------------------------------

* ``UnparseableCRSError`` (#1987 PR 2):
  - Write: typed the existing ``_validate_crs_fallback`` raise from
    plain ``ValueError`` to ``UnparseableCRSError``. No behaviour
    change (subclass relationship).
  - Read: new ``_check_read_unparseable_crs`` that runs ``pyproj.CRS.
    from_user_input`` on ``crs_wkt`` and raises if pyproj cannot parse.
    Tolerates pyproj-parseable placeholders like ``"EPSG:4326"`` that
    the GDAL VRT ``<SRS>`` convention stashes into ``crs_wkt``.
  - Opt-out: ``allow_unparseable_crs=True`` kwarg, threaded through
    ``open_geotiff`` / ``read_geotiff_dask`` / ``read_geotiff_gpu`` /
    ``read_vrt``.

* ``RotatedTransformError`` (#1987 PR 3):
  - Read: new ``_check_read_rotated_transform`` that rejects affine
    transforms with non-zero b / d terms (rasterio Affine ordering).
    Downstream xrspatial ops (slope, aspect, hillshade, proximity,
    zonal) assume axis-aligned grids; a rotated transform silently
    produced wrong results.
  - Opt-out: ``allow_rotated=True`` kwarg, same threading as
    ``allow_unparseable_crs``.

* ``NonUniformCoordsError`` (#1987 PR 4):
  - Write: new ``_check_write_non_uniform_coords`` that diffs the
    ``y`` and ``x`` coord arrays against the first step and rejects
    when relative drift exceeds 1e-5 (mirrors the existing #1720
    coord-regularity tolerance). The int-dtype sentinel from #1969 is
    exempted (the no-georef fallback uses 0..N-1 ints which the writer
    treats specially).
  - No new kwarg: the fix is to resample, not to opt out.

* ``ConflictingNodataError`` (#1987 PR 7):
  - Write: new ``_check_write_conflicting_nodata`` that refuses when
    ``attrs['nodata']`` disagrees with every concrete entry in
    ``attrs['nodatavals']``. ``None`` and NaN entries in the rioxarray
    tuple are skipped (same convention as ``_resolve_nodata_attr``).
    NaN as the canonical value paired with a concrete numeric in the
    tuple also raises -- "NaN is the sentinel" and "X is the sentinel"
    contradict.
  - Opt-out: explicit ``nodata=`` writer kwarg overrides both attrs
    and bypasses the check.

Shared infrastructure
---------------------

* ``_attrs.py`` gains ``_validate_read_geo_info(geo_info, *, window,
  allow_rotated, allow_unparseable_crs)`` -- one helper called from
  the four read backends so the check site is uniform.
* Each writer entry point (``to_geotiff``, ``_write_vrt_tiled``,
  ``write_geotiff_gpu``) now builds a context dict carrying
  ``crs_kwarg`` / ``attrs_crs`` / ``attrs_crs_wkt`` /
  ``nodata_kwarg`` / ``attrs_nodata`` / ``attrs_nodatavals`` /
  ``coord_y`` / ``coord_x`` and feeds it to ``validate_write_metadata``.

Deferred: MixedBandMetadataError (#1987 PR 5)
---------------------------------------------

The mixed-band check function and its constants are defined in
``_validation.py`` but the registration call is commented out. The
``band_nodata=`` kwarg is threaded through both ``read_vrt`` and
``_read_vrt_chunked`` so the follow-up PR is a one-line registration
plus the test-fixture migration. About 35 existing VRT tests would
need to opt in to ``band_nodata='first'`` to keep their legacy
assertions, which is its own commit.

Test updates
------------

* ``test_attrs_contract_aliases_1984.py::test_canonical_nodata_wins_over_aliases``
  split into a resolver-layer test (still asserts canonical wins) and a
  write-layer test (now asserts ``ConflictingNodataError``).
* ``test_nodata_attr_aliases_1582.py::test_explicit_nodata_attr_wins_over_aliases``
  updated to expect ``ConflictingNodataError`` on the disagreement and to
  show the ``nodata=`` kwarg opt-out.
* ``test_reader_kwarg_order_1935.py`` updated: the new
  ``allow_rotated`` / ``allow_unparseable_crs`` kwargs join the
  canonical order; ``band_nodata`` goes in ``read_vrt``'s
  ``allowed_tail`` since it is VRT-specific; ``read_geotiff_gpu``'s
  deprecated ``gpu`` alias moved back to the tail.
* ``test_ambiguous_metadata_hooks_1987.py`` framework tests now use an
  opaque ``_dispatch_probe`` payload key instead of ``"crs_wkt":
  "EPSG:4326"`` / ``"MALFORMED"`` placeholders that the newly registered
  ``_check_read_unparseable_crs`` would refuse.
* ``test_remaining_fail_closed_1987.py`` -- 19 new tests covering each
  active check plus the round-trip read-write contract.

Verification
------------

- ``pytest xrspatial/geotiff/tests/ -k 'not gpu and not cuda'`` --
  3013 passed (was 2994 pre-PR, +19 new).
- ``pytest xrspatial/geotiff/tests/test_remaining_fail_closed_1987.py
  -v`` -- 19 passed.

Refs #1987.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 18, 2026
- vrt.py: eager read_vrt now parses the VRT XML and runs the #1987
  read-side validators *before* _read_vrt_internal materialises the
  mosaic, so a rejected file fails fast instead of loading the full
  array into host memory first. The parsed VRTDataset is threaded
  through via the existing `parsed=` kwarg so the XML is parsed only
  once.
- _validation.py: extracted `_gdal_geotransform_to_affine_tuple` so
  the eager and chunked VRT call sites share one GDAL->rasterio
  reorder helper instead of duplicating the index shuffle.
- _attrs.py: documented in `_validate_read_geo_info` that the
  built transform is always axis-aligned (rotated TIFFs raise
  NotImplementedError upstream in `_geotags`), so the rotated check
  fires only on the VRT path.
- tests: added direct coverage for the row-axis rotation branch
  (GDAL GT[4] non-zero) and for the zero-step / constant-float-coord
  branch in `_check_write_non_uniform_coords`. Refactored the
  rotated-VRT fixture to take a `geo_transform` kwarg so both
  rotation axes share the same builder.
@brendancol brendancol merged commit ae3d08a into 1987-pr1-conflicting-crs-write May 18, 2026
4 of 5 checks passed
brendancol added a commit that referenced this pull request May 18, 2026
… (#2030) (#2031)

* geotiff: bundle remaining ambiguous-metadata fail-closed checks (#1987)

Bundles four of the five remaining #1987 slices (PRs 2, 3, 4, 7 in the
issue's numbering). PR 6 (``ConflictingCRSError``) landed earlier as
this branch's parent. PR 5 (``MixedBandMetadataError``) is intentionally
deferred to a follow-up that also migrates ~35 existing VRT test
fixtures via the new ``band_nodata='first'`` opt-out kwarg.

Active checks added in this PR
------------------------------

* ``UnparseableCRSError`` (#1987 PR 2):
  - Write: typed the existing ``_validate_crs_fallback`` raise from
    plain ``ValueError`` to ``UnparseableCRSError``. No behaviour
    change (subclass relationship).
  - Read: new ``_check_read_unparseable_crs`` that runs ``pyproj.CRS.
    from_user_input`` on ``crs_wkt`` and raises if pyproj cannot parse.
    Tolerates pyproj-parseable placeholders like ``"EPSG:4326"`` that
    the GDAL VRT ``<SRS>`` convention stashes into ``crs_wkt``.
  - Opt-out: ``allow_unparseable_crs=True`` kwarg, threaded through
    ``open_geotiff`` / ``read_geotiff_dask`` / ``read_geotiff_gpu`` /
    ``read_vrt``.

* ``RotatedTransformError`` (#1987 PR 3):
  - Read: new ``_check_read_rotated_transform`` that rejects affine
    transforms with non-zero b / d terms (rasterio Affine ordering).
    Downstream xrspatial ops (slope, aspect, hillshade, proximity,
    zonal) assume axis-aligned grids; a rotated transform silently
    produced wrong results.
  - Opt-out: ``allow_rotated=True`` kwarg, same threading as
    ``allow_unparseable_crs``.

* ``NonUniformCoordsError`` (#1987 PR 4):
  - Write: new ``_check_write_non_uniform_coords`` that diffs the
    ``y`` and ``x`` coord arrays against the first step and rejects
    when relative drift exceeds 1e-5 (mirrors the existing #1720
    coord-regularity tolerance). The int-dtype sentinel from #1969 is
    exempted (the no-georef fallback uses 0..N-1 ints which the writer
    treats specially).
  - No new kwarg: the fix is to resample, not to opt out.

* ``ConflictingNodataError`` (#1987 PR 7):
  - Write: new ``_check_write_conflicting_nodata`` that refuses when
    ``attrs['nodata']`` disagrees with every concrete entry in
    ``attrs['nodatavals']``. ``None`` and NaN entries in the rioxarray
    tuple are skipped (same convention as ``_resolve_nodata_attr``).
    NaN as the canonical value paired with a concrete numeric in the
    tuple also raises -- "NaN is the sentinel" and "X is the sentinel"
    contradict.
  - Opt-out: explicit ``nodata=`` writer kwarg overrides both attrs
    and bypasses the check.

Shared infrastructure
---------------------

* ``_attrs.py`` gains ``_validate_read_geo_info(geo_info, *, window,
  allow_rotated, allow_unparseable_crs)`` -- one helper called from
  the four read backends so the check site is uniform.
* Each writer entry point (``to_geotiff``, ``_write_vrt_tiled``,
  ``write_geotiff_gpu``) now builds a context dict carrying
  ``crs_kwarg`` / ``attrs_crs`` / ``attrs_crs_wkt`` /
  ``nodata_kwarg`` / ``attrs_nodata`` / ``attrs_nodatavals`` /
  ``coord_y`` / ``coord_x`` and feeds it to ``validate_write_metadata``.

Deferred: MixedBandMetadataError (#1987 PR 5)
---------------------------------------------

The mixed-band check function and its constants are defined in
``_validation.py`` but the registration call is commented out. The
``band_nodata=`` kwarg is threaded through both ``read_vrt`` and
``_read_vrt_chunked`` so the follow-up PR is a one-line registration
plus the test-fixture migration. About 35 existing VRT tests would
need to opt in to ``band_nodata='first'`` to keep their legacy
assertions, which is its own commit.

Test updates
------------

* ``test_attrs_contract_aliases_1984.py::test_canonical_nodata_wins_over_aliases``
  split into a resolver-layer test (still asserts canonical wins) and a
  write-layer test (now asserts ``ConflictingNodataError``).
* ``test_nodata_attr_aliases_1582.py::test_explicit_nodata_attr_wins_over_aliases``
  updated to expect ``ConflictingNodataError`` on the disagreement and to
  show the ``nodata=`` kwarg opt-out.
* ``test_reader_kwarg_order_1935.py`` updated: the new
  ``allow_rotated`` / ``allow_unparseable_crs`` kwargs join the
  canonical order; ``band_nodata`` goes in ``read_vrt``'s
  ``allowed_tail`` since it is VRT-specific; ``read_geotiff_gpu``'s
  deprecated ``gpu`` alias moved back to the tail.
* ``test_ambiguous_metadata_hooks_1987.py`` framework tests now use an
  opaque ``_dispatch_probe`` payload key instead of ``"crs_wkt":
  "EPSG:4326"`` / ``"MALFORMED"`` placeholders that the newly registered
  ``_check_read_unparseable_crs`` would refuse.
* ``test_remaining_fail_closed_1987.py`` -- 19 new tests covering each
  active check plus the round-trip read-write contract.

Verification
------------

- ``pytest xrspatial/geotiff/tests/ -k 'not gpu and not cuda'`` --
  3013 passed (was 2994 pre-PR, +19 new).
- ``pytest xrspatial/geotiff/tests/test_remaining_fail_closed_1987.py
  -v`` -- 19 passed.

Refs #1987.

* geotiff: address review suggestions and nits on #1987 fail-closed bundle

- vrt.py: eager read_vrt now parses the VRT XML and runs the #1987
  read-side validators *before* _read_vrt_internal materialises the
  mosaic, so a rejected file fails fast instead of loading the full
  array into host memory first. The parsed VRTDataset is threaded
  through via the existing `parsed=` kwarg so the XML is parsed only
  once.
- _validation.py: extracted `_gdal_geotransform_to_affine_tuple` so
  the eager and chunked VRT call sites share one GDAL->rasterio
  reorder helper instead of duplicating the index shuffle.
- _attrs.py: documented in `_validate_read_geo_info` that the
  built transform is always axis-aligned (rotated TIFFs raise
  NotImplementedError upstream in `_geotags`), so the rotated check
  fires only on the VRT path.
- tests: added direct coverage for the row-axis rotation branch
  (GDAL GT[4] non-zero) and for the zero-step / constant-float-coord
  branch in `_check_write_non_uniform_coords`. Refactored the
  rotated-VRT fixture to take a `geo_transform` kwarg so both
  rotation axes share the same builder.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant